{
"cells": [
{
"cell_type": "markdown",
"id": "56362576",
"metadata": {},
"source": [
"# Homework 2\n",
"\n",
"Due: 11:59PM Eastern September 28th\n",
"\n",
"Submit via a direct message to the TA on slack\n",
"\n",
"In this homework we'll explore decision trees and overfitting, and learn about the right way to evaluate the performance of a classifier.\n",
"\n",
"The cell below imports the Python packages that you'll need, including scikit-learn, which conatains implementations of many learning algorithms and supporting infrastructure."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d5cb5b7d",
"metadata": {},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"from sklearn import tree\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import accuracy_score\n",
"import numpy as np\n",
"import random\n",
"from sklearn.tree import export_text\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"id": "12ea440c",
"metadata": {},
"source": [
"The cell below implements a simple dataset generator that we'll use to explore the impact of various features of datasets that may lead to overfitting."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "4f705984",
"metadata": {},
"outputs": [],
"source": [
"def make_dataset(n, d = 4, p = 0):\n",
" \"\"\"\n",
" Create a dataset with boolean features and a binary class label.\n",
" The label is assigned as x1 ^ x2 V x3 ^ x4.\n",
" \n",
" Arguments:\n",
" n - The number of instances to generate\n",
" m - The number of features per instance. Any features beyond the first four\n",
" are irrelevant to determining the class label.\n",
" p - The probability that the true class label as computed by the expression\n",
" above is flipped. Said differently, this is the probability of class noise.\n",
" \"\"\"\n",
" \n",
" assert d >= 4, 'The dataset must have at least 4 features'\n",
" X = [np.random.randint(2, size = d) for _ in range(n)]\n",
" y = [(x[0] and x[1]) or (x[2] and x[3]) for x in X]\n",
" y = [v if random.random() >= p else (v + 1) % 2 for v in y]\n",
" return X, y"
]
},
{
"cell_type": "markdown",
"id": "e655d42d",
"metadata": {},
"source": [
"When evaluating the accuracy of a classifier, the right way to do it is to have a test set of instances that were not used to train the classifier and measure on those instances. The [train_test_split()](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) function in scikit makes it easy to create training and testing sets. Below is an example that shows overfitting as evidenced by higher accuracy on the training set than the testing set."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c9296ffb",
"metadata": {},
"outputs": [],
"source": [
"# Create a dataset with 1000 instances, each with 10 attributes, and 10% class noise\n",
"X, y = make_dataset(1000, d = 10, p = 0.1)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "68be6e50",
"metadata": {},
"outputs": [],
"source": [
"# Make training and testing sets, each with half of the data\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, train_size=0.5)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "c9c91fa8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training accuracy: 0.97\n",
"Testing accuracy: 0.84\n"
]
}
],
"source": [
"# Train the classifier and evaluate it on train/test splits\n",
"clf = tree.DecisionTreeClassifier()\n",
"clf.fit(X_train, y_train)\n",
"print('Training accuracy: %.2f' % accuracy_score(y_train, clf.predict(X_train)))\n",
"print('Testing accuracy: %.2f' % accuracy_score(y_test, clf.predict(X_test)))"
]
},
{
"cell_type": "markdown",
"id": "4b80c59b",
"metadata": {},
"source": [
"Note that if the training set has 0% class noise, we get a perfect tree. Spend some time convincing yourself that the tree below captures the boolean expression that assigns class labels."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "119f8f27",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"|--- feature_0 <= 0.50\n",
"| |--- feature_3 <= 0.50\n",
"| | |--- class: 0\n",
"| |--- feature_3 > 0.50\n",
"| | |--- feature_2 <= 0.50\n",
"| | | |--- class: 0\n",
"| | |--- feature_2 > 0.50\n",
"| | | |--- class: 1\n",
"|--- feature_0 > 0.50\n",
"| |--- feature_1 <= 0.50\n",
"| | |--- feature_3 <= 0.50\n",
"| | | |--- class: 0\n",
"| | |--- feature_3 > 0.50\n",
"| | | |--- feature_2 <= 0.50\n",
"| | | | |--- class: 0\n",
"| | | |--- feature_2 > 0.50\n",
"| | | | |--- class: 1\n",
"| |--- feature_1 > 0.50\n",
"| | |--- class: 1\n",
"\n"
]
}
],
"source": [
"X, y = make_dataset(1000, d = 10, p = 0.0)\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, train_size=0.5)\n",
"clf = tree.DecisionTreeClassifier()\n",
"clf.fit(X_train, y_train)\n",
"print(export_text(clf))"
]
},
{
"cell_type": "markdown",
"id": "6484eb4e",
"metadata": {},
"source": [
"# Assignment\n",
"\n",
"Explore the impact of the following on the extent of overfitting:\n",
"* The size of the dataset (n in the call to make_dataset)\n",
"* The number of irrelevant features (d in the call to make_dataset)\n",
"* The probability of class noise (p in the call to make_dataset)\n",
"* The minimum number of samples required for a node to be split. That is the min_samples_split parameter to the [DecisionTreeClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier) constructor"
]
},
{
"cell_type": "markdown",
"id": "f89da077",
"metadata": {},
"source": [
"## What to turn in \n",
" \n",
"For each of the parameters mentioned above, vary the value of the parameter and build learning curves for training and testing accuracy, and plot them. That is, the value of the parameter will be on the horizontal axis and accuracy will be on the vertical axis. For each of the parameters write up an explanation for the impact the parameter has on overfitting. Does its value impact overfitting? How significant is the effect? Why do you think that parameter has the observed effect? Also, in each case, display at least one decision tree and explain what is happening that is making it overfit."
]
},
{
"cell_type": "markdown",
"id": "2c30ebe7",
"metadata": {},
"source": [
"Here is an example of generating a learning curve for a fixed size dataset where the fraction of instances used for training is varied. You can use this template to create your own learning curves. NOTE: This example varies the size of the training set. You can use this example and modify it for each of the parameters above, where instead of varying the size of the training set you'll vary the parameter's values."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "606a3a3e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAktUlEQVR4nO3deZRU9Z338fe3q/eFbmhWaRBUZJEtsUXjktEkKLgbcXdmjjOG+JyY8clzzKjJY2aSzOJM5skkzpgYkuPjyWPUuISIBiOJ0ZDEDYjNKkgLKA2yQzcN9Fb1e/743W6qm2q6Gqq6qm9/XufUoaru/VV/q+j+3Ht/91e/a845REQkvHIyXYCIiKSXgl5EJOQU9CIiIaegFxEJOQW9iEjI5Wa6gESGDh3qxo0bl+kyRET6jRUrVuxxzg1LtCwrg37cuHEsX74802WIiPQbZvZhd8vUdSMiEnIKehGRkFPQi4iEnIJeRCTkFPQiIiHXY9Cb2WNmtsvM1nSz3MzsYTOrNbNVZvbJuGVzzGxDsOz+VBYuIiLJSWaP/nFgznGWzwUmBLf5wA8BzCwCPBIsnwLcYmZTTqZYERHpvR7H0TvnlprZuOOscg3wU+fnO37LzCrMbBQwDqh1zm0CMLOng3XXnXTV3Xj41Y2UFOQyYlABIwcVMmJQIcMHFVCQG0nXjxQRyXqp+MLUaGBr3OO64LlEz5/b3YuY2Xz8EQFjx47tdRGxmGPB0k00Nrcds2xIST7DywoYWV7IyEGFDB9UGGwIChgxqJCR5YUMKc4nJ8d6/XNFRLJdKoI+UTq64zyfkHNuAbAAoLq6utdXQ8nJMVb/46XUH2llZ0MzOxqa2NnQxM76puB+Mzsbmli7vYE9jc10vd5KXsQYXuaPANqPBvxGoIARZYWMCDYSJQVZ+WViEZFupSK16oAxcY+rgO1AfjfPp42ZUVGcT0VxPhNHlnW7Xls0xu7GZr9BqA82CA1+g7CroZmNuxr548Y9HExwdFAadA2NaD8qKC9kRHC00L5xGFZWQF5EA5pEJDukIugXAXcHffDnAvXOuY/NbDcwwczGA9uAm4FbU/DzTlpuJIdR5UWMKi/qvCnq4lBzW7ARaO7YEHRsFOqbeHvzPnYdbKI12vnwwAwqSwoYWV7QbVfRiLJCKorzMFN3kYikV49Bb2ZPARcDQ82sDvgHIA/AOfcosBi4HKgFDgN3BMvazOxu4BUgAjzmnFubhveQNiUFuZw2rJTThpV2u04s5th/uCVuI9D5KGHbgSbe/egAew+1HNM2Pzen04nj9qOE9u6j9qOEwjydTBaRE2fZeHHw6upqF7bZK5vbouxqaGbXwSZ21Dd36i6K30AcaY0e07a8KK9zd1GXLqORgwqpLC0gopPJIgOWma1wzlUnWqYzi32kIDfCmCHFjBlS3O06zjkONrexs76p8wnloKto58FmNu7cw+7GZqKxzhvoSI4xrLSgc/dQx5FCQcf5hLKCXHUXiQwwCvosYmYMKsxjUGEeE0Z0fzI5GnPsbWzuGE20Ixhd1H6EsGXvId7evI/6I63HtC3KizCyvLBjuOnpw0qZXlXOjKoKBpfkp/PtiUiGKOj7oUiOMTw4yXs8R1qiQVfR0RFF8UcJKz7cz6KV2zuGmo4ZUsSMqgpmVFUwvaqcqaPLNZxUJAT0VxxiRfkRTq0s4dTKkm7XOdjUyupt9ayqq2fl1gO8+9EBXlr1MQA5BmcML/XBP6aCGVXlTBo5iPxcDR0V6U8U9ANcWWEe558+lPNPH9rx3J7GZlbVHWDl1npW1h3g1fW7eHZFHQD5kRwmjypjxpgKplf58D9tWKlOBItkMY26kR4556jbf4RVdfV+A1B3gNV19Rxq8SOESvIjTB1dzswg/KdXlVM1uEgnfUX6kEbdyEkxs44RQ1dMHwX4E8KbdjeysiP86/m/f9pCSzQG+PmF2k/yzhhTzvSqCoaWFmTybYgMWAp6OSGRHGPCiDImjChj3tlVALS0xVi/o8GH/9YDrKqrZ+n7G2kfCTq6oojpVT70Z4wpZ9rocsoK8zL4LkQGBgW9pEx+bk7QdVMB550K+Gkk1rSf7K3z4f/ymh2AnyritKElHaN8ZoypYPKoQfomsEiKKeglrUoKcjn3tErOPa2y47l9h1pYFYT+qroDLN24h1+8uw2A3Bxj0qiyjhO9M8ZUcMawUnI1SZzICdPJWMk45xwf1zd19PW3bwQONvnZQ4vyIkwdPajjRO/MMRWMHVKsk70icXQyVrKamXFKRRGnVBQxZ6o/2RuLOTbvPdQxzHNV3QGeeOtDmtv8yd6K4jymjS7v6PaZOaaixy+QiQxU2qOXfqM1GmPDjoNxwzzreX/nwY55f0YOKuzo659eVc700RWUF+tkrwwMx9ujV9BLv3akJcra7fWdunw27znUsXz80JKjI32qyjnrlHKK8nWyV8JHXTcSWkX5EarHDaF63JCO5+oPt7Jq24GOaR3e3rSPF2r8xc0iOcaZI8qYEYT/9KpyJo4s0xXBJNS0Ry8Dws6GJlYGY/vbh3m2z+5ZkJvDlFMGdfpy1/jKEl0sXvoVdd2IdOGc46N9h6nZenSY55ptDR0XfikrzPUne4PJ3KZXVTCqvFAjfSRrqetGpAsz65jZ85qZowF/0fja3Y2s3Hp0mOePl26iLTjZO7S0gDFDisiP5JCfm9Pxb17747jn8iNdn7eO+3mRnGNeo9PrxL9Gx2uZNjJywhT0IoHcSA6TRg5i0shB3HSOf66pNcp7Hzd0dPvsbmymuS3GwaY2WqMxWtpitERjtAb/NrfFOp6PpfhgufNGwTptUAoSbCzycnMoSLAx6lg/0v46kY7XK0jwGvmRLs93eQ11cWU/Bb3IcRTmRfjE2MF8YuzgXrdti8ZojTpa2mI0R6Md99s3BPEbhY7no9083xajORqjtc3REo0Gz7uEr9PY3NaxAer6Gi1BTakUybFORzcFcRujgtwIhXk5FOZFjt5ycyjKP3q/MD9CYa5/XJSf03Hf3462LYp7XJCboyOcXlDQi6RJbiSH3AjBcM7sGc/vnOvYCLRvLDptGKKdNwyJNhbx/8ZvZFraXy84ymlui9LU6o+Adh/0R0NHWqI0tUVpavXLTlR76BcFG4KC9g1I3MalKC9CQbCBKIrbeBx9vuuGx29sCjo2PH55f5+CQ0EvMsCY+T3tgtzMf5/AOUdzW6wj9I+0tm8AohxpjdLcGuu43xR3v7k1SlPQzm84Yh3tmlqj7Glsi3utGM1Bu7YT7E/LzbFOG42uRxjdHa0UxG2I4jc2BZ02PMe+Vqop6EUkY8wsbeGWSFs0RlP7UUVrlOa2KEdaYh1HGPEbjeYEG5j4jUb7RiT+aKWpy7LeqizJZ8WDs1P+vhX0IjJg5EZyKI3kUNoHF71P9milqX1j0xolL5Ke8w4KehGRNOjro5Xj6d9nGEREpEcKehGRkFPQi4iEnIJeRCTkFPQiIiGnoBcRCbmkgt7M5pjZBjOrNbP7EywfbGYLzWyVmb1jZlPjlm0xs9VmVmNmmntYRKSP9TiO3swiwCPAbKAOWGZmi5xz6+JW+xpQ45y7zswmBet/Nm75Jc65PSmsW0REkpTMHv0soNY5t8k51wI8DVzTZZ0pwKsAzrn1wDgzG5HSSkVE5IQkE/Sjga1xj+uC5+KtBD4PYGazgFOBqmCZA5aY2Qozm9/dDzGz+Wa23MyW7969O9n6RUSkB8kEfaLJF7pOAfcQMNjMaoAvA+8CbcGyC5xznwTmAl8ys08n+iHOuQXOuWrnXPWwYcOSKl5ERHqWzFw3dcCYuMdVwPb4FZxzDcAdAOavBrA5uOGc2x78u8vMFuK7gpaedOUiIpKUZPbolwETzGy8meUDNwOL4lcws4pgGcCdwFLnXIOZlZhZWbBOCXApsCZ15YuISE963KN3zrWZ2d3AK0AEeMw5t9bM7gqWPwpMBn5qZlFgHfC3QfMRwMLgkl+5wJPOuV+n/m2IiEh3zLkUX8E4Baqrq93y5RpyLyKSLDNb4ZyrTrRM34wVEQk5Bb2ISMgp6EVEQk5BLyIScgp6EZGQU9CLiIScgl5EJOQU9CIiIaegFxEJOQW9iEjIKehFREJOQS8iEnIKehGRkFPQi4iEnIJeRCTkFPQiIiGnoBcRCTkFvYhIyCnoRURCTkEvIhJyCnoRkZBT0IuIhJyCXkQk5BT0IiIhp6AXEQk5Bb2ISMgp6EVEQk5BLyIScgp6EZGQU9CLiIRcUkFvZnPMbIOZ1ZrZ/QmWDzazhWa2yszeMbOpybYVEZH06jHozSwCPALMBaYAt5jZlC6rfQ2occ5NB/4K+H4v2oqISBols0c/C6h1zm1yzrUATwPXdFlnCvAqgHNuPTDOzEYk2VZERNIomaAfDWyNe1wXPBdvJfB5ADObBZwKVCXZlqDdfDNbbmbLd+/enVz1IiLSo2SC3hI857o8fggYbGY1wJeBd4G2JNv6J51b4Jyrds5VDxs2LImyREQkGblJrFMHjIl7XAVsj1/BOdcA3AFgZgZsDm7FPbUVEZH0SmaPfhkwwczGm1k+cDOwKH4FM6sIlgHcCSwNwr/HtiIikl497tE759rM7G7gFSACPOacW2tmdwXLHwUmAz81syiwDvjb47VNz1sREZFEzLmEXeYZVV1d7ZYvX57pMkRE+g0zW+Gcq060TN+MFREJOQW9iEjIKehFREJOQS8iEnIKehGRkFPQi4iEnIJeRCTkFPQiIiGnoBcRCTkFvYhIyCnoRURCTkEvIhJyCnoRkZBT0IuIhJyCXkQk5BT0IiIhp6AXEQk5Bb2ISMgp6EVEQk5BLyIScgp6EZGQU9CLiIScgl5EJOQU9CIiIaegFxEJOQW9iEjIKehFREJOQS8iEnIKehGRkFPQi4iEnIJeRCTkkgp6M5tjZhvMrNbM7k+wvNzMXjSzlWa21szuiFu2xcxWm1mNmS1PZfEiItKz3J5WMLMI8AgwG6gDlpnZIufcurjVvgSsc85dZWbDgA1m9jPnXEuw/BLn3J5UFy8iIj3rMeiBWUCtc24TgJk9DVwDxAe9A8rMzIBSYB/QluJaRSRdYlFoPQwth6H1ELQeCe4Ht5bguW7vHwrWP9K5fdlImHEzTL0eiodk+l0OWMkE/Whga9zjOuDcLuv8N7AI2A6UATc552LBMgcsMTMH/Mg5tyDRDzGz+cB8gLFjxyb9BkQGhFi0+1DtFMjd3D/espbDEG3uZUEGecWQX+z/jb9fOvLo/R2rYPG98MrX4Mw5MPM2OONzEEkmeiRVkvm0LcFzrsvjy4Aa4DPA6cBvzOwPzrkG4ALn3HYzGx48v945t/SYF/QbgAUA1dXVXV9fJLtF2xLv/SYM2Pg95vb7h4J1Et0/cmJBnF8ShHBRcL/IPy4bFdwvCQK5u/vFXdrHBXpuIViiaEjg41Ww8ilY9Qy8twhKhsP0G2HmrTDirF5/1NJ7yQR9HTAm7nEVfs893h3AQ845B9Sa2WZgEvCOc247gHNul5ktxHcFHRP0IlkrFoXNS2HNc7CnNnGAR1t6fp14lnPsnnD7/UGnJHg+Lqg7lgXPdbofBHJuQfJBnG6jpvvb7G/Bxt9Azc/g7R/Bm/8No2b4vfyp86CkMtOVhlYyQb8MmGBm44FtwM3ArV3W+Qj4LPAHMxsBTAQ2mVkJkOOcOxjcvxT4VsqqF0kX52D7n2H1c7DmeWjcCQWDfDAVje6899tpjzfB3m+i+9kUxH0lkgeTLve3Q3v9hrPmZ/Dy38MrX4czL/OhP2G2X1dSpsegd861mdndwCtABHjMObfWzO4Klj8KfBt43MxW47t67nPO7TGz04CF/hwtucCTzrlfp+m9iJy8PbWw+ll/2/cBRPJ9AE27ASZcBnmFma4wHEoq4dwv+tuONUHXzs9h/UtQPBSm3wQzb4GR0zJdaSiY723JLtXV1W75cg25lz5ycAes+QWsfga2vwsYjL8Ipt0Ik6+CoopMVzgwRFuh9lW/l7/hZYi1+qCfeZvf0JYMzXSFWc3MVjjnqhMuU9DLgNRUD++96PfcNy8FF4NRM/1JwrM+D4NGZbrCge3wPt9lVvMzv/HNyfWjdmbcAhMuhdz8TFeYdRT0IgCtTbBxiQ/391/xI1kGj/fhPu0GGDoh0xVKIjvXwconYeXP4dAuKK70R1szb/UneQVQ0MtAFovClj/6bpl1L0JzvR/eN/V6H+6jPznwTor2V9E2+OB3QdfOYj/SacRUH/jTboTSYZmuMKMU9DKwOAcf18CqZ4MRMzsgv8z3t0+/AcZ9Wl/Y6e/au3ZWPgXbVviunQmX+tCfcNmA7No5XtDrt13CY+8Hfjjk6mdh70Y/YmbCpX7P/czL/NBHCYfiITDrC/62a/3Rrp0Ni6FoiP8/n3mrHw6rIzbt0Us/d3AnrP2FD/dtKwCDcRf6P/QpV0PR4ExXKH0l2gabXoOaJ2H9r/w5mOFTjnbtlI3IdIVppa4bCZemhrgRM7/3I2ZGTj86YqZ8dKYrlEw7st8PmV35FNQtA4v4L2LNvNWP3sktyHSFKaegl/6vrdl/fX71M7Dh18GImXF+T23aPBg2MdMVSrba/X7QtfM0HPzYH+VNnedD/5RPhKZrR0Ev/VMsCh/+ye+5r3vBj30vGeb32qffCKPPDs0fqfSBWLRz105bEwyb7L+BO/0mP6VyP6agl/7DOfh4pQ/3Nb+Ag9shv9SPmJk2D8ZfrBEzcvKOHIC1C33o173jJ5k743NB187cfjnVhYJest++TUdHzOx5H3LyghEz83yfan5xpiuUsNqz0fflr3waGrZBYXnQtXNbv/qehYJeslPjLr9XteoZ2Bb8f596oR/rPvlqXZFI+lYs6k/u1zzpT/a3NcHQiX4vf/pNWT8thoJeskdTg+8fXf0MbHo9GDEzzQ+HnHo9lFdlukIRfz5o7S996G99y3ftnP4ZH/oTr8jKrh0FvWRWWwvU/sZ3y2x42e8pVZzqw33aDTB8UqYrFOne3g98107NU9BQBwXlMPXzvmunqjprunYU9NL3YrEuI2YO+HnGp37eh3vVOVnzByKSlFgMtiz1e/nrFkHbEaic4PfyZ9zsrwyWQQp66RvOwY7Vvltm9fNHR8xMutKH+2l/oSsHSTg0NfgdmJon4aM3AIPTL/F7+ZOuyMh0Gwp6Sa99m/1l4VY9C3s2+AmmzpjtT6qeOVcjZiTc9m3yI3ZqnoL6j/wlJ8+6zof+mFl9duSqoJfUa9ztR8ysftaPQwY49QI/HHLKtRoxIwNPLAYf/jHo2nnBXzR+yOlHu3bSPNBAQS+p0XwwGDHzLHzwGrionw+8fcRMxZhMVyiSHZoPBl07T/nwx3zX5czbfFdmGo5yFfRy4tpa4INX/Vj3DS/7E1DlY/2e+7QbYMSUTFcokt32bfZdOyufhAMf+WsjnHWtD/2x56Wsa0dBL70Ti8FHbwYjZn7pZwIsrvT9jtNu7NN+R5HQiMX8iduaJ/0Y/dZDMOQ0mBF07ZzkEbGCXnrmHOxc48N99fN+vHBeiR9BMO0GP6JAI2ZEUqO5Ed5b5EN/yx8Ag/EX+b38qfNOaD4nBb10b/+HQbg/B7vfC0bMfM6H+8S5kF+S6QpFwm3/h0e7dmJRuGcV5OT0+mUU9HKsPbWw+F4/bSvA2E8FV2W6FkoqM1qayIDknJ9U7QRH5+iasXJUtBXe+C94/SHILYTPPOjndq8Ym+nKRAY2s7QNwVTQDyTba2DR3f7bq5Ovhsu/0+8vtiAiPVPQDwQth+H1f4U3H4GSoXDTE/5CHiIyICjow27zUlj0d7B/M3zyr2D2t6GoItNViUgfUtCH1ZED8JsH4c8/hcHj4a8W+W/miciAo6APo/dehF/dC4d2wfl/Bxc/oInFRAawpAZrmtkcM9tgZrVmdn+C5eVm9qKZrTSztWZ2R7JtJYUO7oSf/yX8/HYoGQZf+B1c+m2FvMgA1+MevZlFgEeA2UAdsMzMFjnn1sWt9iVgnXPuKjMbBmwws58B0STayslyDt59ApZ8HVqb4LPf8Hvy+iariJBc180soNY5twnAzJ4GrgHiw9oBZWZmQCmwD2gDzk2irZyMfZvhxXv8RY3Hng9XPwxDJ2S6KhHJIskE/Whga9zjOnyAx/tvYBGwHSgDbnLOxcwsmbYAmNl8YD7A2LH68k6Pom3w9g/hd//spy244rtw9h0n9NVpEQm3ZII+0TSFXedNuAyoAT4DnA78xsz+kGRb/6RzC4AF4KdASKKugWvHGv/Fp+3v+is4XfF/oHx0pqsSkSyVTNDXAfHzZ1bh99zj3QE85PzEObVmthmYlGRbSVZrEyz9Dvzpe1BYAfMeg7M+rymDReS4kgn6ZcAEMxsPbANuBm7tss5HwGeBP5jZCGAisAk4kERbScaHb8KiL8PejTDjFrjsX3S5PhFJSo9B75xrM7O7gVeACPCYc26tmd0VLH8U+DbwuJmtxnfX3Oec2wOQqG163kpINTXAq9+EZT/xV3a6/Xk/jbCISJI0TXE2e/8VeOkr0LAdzvsfcMnXoaA001WJSBbSNMX9TeNu+PV9sOZ5GDYZ7vwpVCX8/xMR6ZGCPps4B6t+Dr9+wF9F/uKvwYVfgdz8TFcmIv2Ygj5bHPjId9PU/haqZsHV/wXDJ2W6KhEJAQV9psWi8M6P4dVv+cdz/x3OuRNyIpmtS0RCQ0GfSbvW+y8+1S3zI2mu/E9d0k9EUk5BnwltLfDH78LS/4CCMrhugb9uq774JCJpoKDva1uX+S8+7X4Pps6Duf/mL+8nIieltbWVuro6mpqaMl1KWhUWFlJVVUVeXvKz0yro+0pzI/zun+DtR2HQKXDrM3DmZZmuSiQ06urqKCsrY9y4cVhIj46dc+zdu5e6ujrGjx+fdDsFfV+o/S28+BWo/wjO+YKfL75wUKarEgmVpqamUIc8gJlRWVnJ7t27e9VOQZ9Oh/f5MfGrnoahZ8LfvAJjz8t0VSKhFeaQb3ci71FBnw7OwdpfwOK/h6YDcNG98OmvQl5hpisTkQFIV6lItfpt8NQt8Nzf+KGS838Pn31QIS8ScgcOHOAHP/hBr9tdfvnlHDhwIPUFxVHQp0os5meYfORc2PQ6XPrPcOdvYeTUTFcmIn2gu6CPRqPHbbd48WIqKirSVJWnrptU2LMRFv0dfPQGjP8LuOr7MCT5M+IiklrffHEt67Y3pPQ1p5wyiH+46qxul99///188MEHzJw5k7y8PEpLSxk1ahQ1NTWsW7eOa6+9lq1bt9LU1MQ999zD/PnzARg3bhzLly+nsbGRuXPncuGFF/LGG28wevRoXnjhBYqKik66dgX9yYi2wp++D7//d981c80jMPM2ffFJZAB66KGHWLNmDTU1Nbz++utcccUVrFmzpmMY5GOPPcaQIUM4cuQI55xzDtdffz2VlZWdXmPjxo089dRT/PjHP+bGG2/k+eef5/bbbz/p2hT0J2rbn/1e/M7VMOUamPsdKBuR6apEBI67591XZs2a1Wms+8MPP8zChQsB2Lp1Kxs3bjwm6MePH8/MmTMBOPvss9myZUtKalHQ91bLYXj9X+DNR6BkONz0M5h8ZaarEpEsU1JS0nH/9ddf57e//S1vvvkmxcXFXHzxxQm/wVtQUNBxPxKJcOTIkZTUoqDvjU2/hxfvgf2b4ZN/DbO/BUUVma5KRLJAWVkZBw8eTLisvr6ewYMHU1xczPr163nrrbf6tDYFfTKO7IclD8K7/w+GnAZ//RKMvyjTVYlIFqmsrOSCCy5g6tSpFBUVMWLE0a7cOXPm8OijjzJ9+nQmTpzIeef17Rcndc3YnqxbBIvvhUN74Py74eIHIO/kz4KLSGq99957TJ48OdNl9IlE71XXjD0RB3f4gH/vRRg5zU9CdsrMTFclItJrCvqunIM//9R31USb4XP/CJ+6GyLJTwkqIpJNFPTx9n7gT7Zu+QOceiFc/TBUnp7pqkREToqCHiDaBm89Aq/9q99zv/J7flRNjmaIEJH+T0H/8Sp/xaePa2DiFXDFf/gLg4iIhMTADfrWJvj9v/kpDIqHwA2Pw5RrNX2BiITOwOyb2PInePQCf4HuGTfDl96Bs65TyIvICTvRaYoBvve973H48OEUV3TUwAr6pgZ46Svw+OUQbYG/XAjX/sDv0YuInIRsDvqB03Wz4WV46X9B4w4470vwma9DfknP7USk/3n5ftixOrWvOXIazH2o28Xx0xTPnj2b4cOH88wzz9Dc3Mx1113HN7/5TQ4dOsSNN95IXV0d0WiUBx98kJ07d7J9+3YuueQShg4dymuvvZbauhkIQd+4G17+e39pv+FT4KYnoOrsTFclIiETP03xkiVLeO6553jnnXdwznH11VezdOlSdu/ezSmnnMKvfvUrwM+BU15ezne/+11ee+01hg4dmpbawhv0zsHKp+GVB6DlEFzyv+GCeyA3P9OViUi6HWfPuy8sWbKEJUuW8IlPfAKAxsZGNm7cyEUXXcS9997Lfffdx5VXXslFF/XNnFlJBb2ZzQG+D0SAnzjnHuqy/KvAbXGvORkY5pzbZ2ZbgINAFGjrbi6GlNr/Ibz0P+GD38GYc+Hq/4JhE9P+Y0VEAJxzPPDAA3zxi188ZtmKFStYvHgxDzzwAJdeeinf+MY30l5Pj0FvZhHgEWA2UAcsM7NFzrl17es4574DfCdY/yrgK865fXEvc4lzbk9KK08kFoV3FsCr3/YjaOZ+B865U198EpG0i5+m+LLLLuPBBx/ktttuo7S0lG3btpGXl0dbWxtDhgzh9ttvp7S0lMcff7xT20x23cwCap1zmwDM7GngGmBdN+vfAjyVmvJ64ch+eGIebFsOZ8yGK/8TKsb0eRkiMjDFT1M8d+5cbr31Vj71qU8BUFpayhNPPEFtbS1f/epXycnJIS8vjx/+8IcAzJ8/n7lz5zJq1Ki0nIztcZpiM5sHzHHO3Rk8/kvgXOfc3QnWLcbv9Z/RvkdvZpuB/YADfuScW9DNz5kPzAcYO3bs2R9++GHv3olz8IsvwITLYNo8jYkXGWA0TfHJTVOcKDG72zpcBfypS7fNBc657WY2HPiNma13zi095gX9BmAB+Pnok6irS5UG1/+k181ERMIumc7rOiC+D6QK2N7NujfTpdvGObc9+HcXsBDfFSQiIn0kmaBfBkwws/Fmlo8P80VdVzKzcuAvgBfinisxs7L2+8ClwJpUFC4i0lU2XjEv1U7kPfbYdeOcazOzu4FX8MMrH3POrTWzu4LljwarXgcscc4dims+Alhovr88F3jSOffrXlcpItKDwsJC9u7dS2VlJRbSc3TOOfbu3UthYWGv2umasSISCq2trdTV1dHU1JTpUtKqsLCQqqoq8vI6X/VO14wVkdDLy8tj/PjxmS4jK+mbRCIiIaegFxEJOQW9iEjIZeXJWDPbDfTyq7EdhgLpn1en91RX76iu3lFdvRPGuk51zg1LtCArg/5kmNnyPpkhs5dUV++ort5RXb0z0OpS142ISMgp6EVEQi6MQZ9wdswsoLp6R3X1jurqnQFVV+j66EVEpLMw7tGLiEgcBb2ISMj1m6A3szlmtsHMas3s/gTLJ5nZm2bWbGb39qZtBuvaYmarzazGzFI6i1sSdd1mZquC2xtmNiPZthmsK5Of1zVBTTVmttzMLky2bQbrStvnlUxtceudY2bR4Gp1vWqbgboy+Tt2sZnVBz+7xsy+kWzbHjnnsv6Gnx75A+A0IB9YCUzpss5w4Bzgn4F7e9M2E3UFy7YAQzP0eZ0PDA7uzwXezpLPK2FdWfB5lXL0nNZ0YH2WfF4J60rn59Wb9x2s9ztgMTAvGz6z7urKgt+xi4GXTvQ9He/WX/boOy5Q7pxrAdovUN7BObfLObcMaO1t2wzVlU7J1PWGc25/8PAt/JXDkmqbobrSKZm6Gl3wVweUcPRympn+vLqrK92Sfd9fBp4Hdp1A276uK51O5j2f9OfVX4J+NLA17nFd8Fy626b7tR2wxMxWmL84eqr0tq6/BV4+wbZ9VRdk+PMys+vMbD3wK+BvetM2A3VB+j6vpGozs9H4CxI9SmcZ/cyOUxdk/m/yU2a20sxeNrOzetm2W/1lPvreXKA8lW3T/dpJXTg9nXWZ2SX4QG3v282KzytBXZDhz8s5txB/xbRPA98GPpds2wzUBen7vJKt7XvAfc65qHW+4lOmP7Pu6oLM/o79GT9fTaOZXQ78EpiQZNvj6i979L25QHkq26b1tV36LpyeVF1mNh34CXCNc25vb9pmoK6Mf15xdSwFTjezob1t24d1pfPzSra2auBpM9sCzAN+YGbXJtk2E3Vl9HfMOdfgnGsM7i8G8lL2O5bqkw7puOGPPDYB4zl6MuKsbtb9RzqfjE26bR/XVQKUxd1/A5jTV3UBY4Fa4PwTfU99XFemP68zOHrS85PANvyeVqY/r+7qStvndSK/J8DjHD0ZmxV/kwnqyvTv2Mi4/8tZwEep+h1LyX96X9yAy4H38Wefvx48dxdwV9yHVAc0AAeC+4O6a5vpuvBn0FcGt7UZqOsnwH6gJrgtP17bTNeVBZ/XfcHPrQHeBC7Mks8rYV3p/rySqa3Luo/TeXRLxj6z7urKgt+xu4OfuxI/EOH847XtzU1TIIiIhFx/6aMXEZETpKAXEQk5Bb2ISMgp6EVEQk5BLyIScgp6EZGQU9CLiITc/wfDpCFPAe32ZwAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"X, y = make_dataset(1000, d = 10, p = 0.1)\n",
"\n",
"test_acc = []\n",
"train_acc = []\n",
"frac = [0.1, 0.2, 0.3, 0.4, 0.5]\n",
"for f in frac:\n",
" X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, train_size=f)\n",
" clf = tree.DecisionTreeClassifier()\n",
" clf.fit(X_train, y_train)\n",
" train_acc.append(accuracy_score(y_train, clf.predict(X_train)))\n",
" test_acc.append(accuracy_score(y_test, clf.predict(X_test)))\n",
" \n",
"plt.plot(frac, train_acc, label = 'train')\n",
"plt.plot(frac, test_acc, label = 'test')\n",
"plt.legend()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "0feb7c3f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"|--- feature_0 <= 0.50\n",
"| |--- feature_2 <= 0.50\n",
"| | |--- feature_4 <= 0.50\n",
"| | | |--- feature_1 <= 0.50\n",
"| | | | |--- class: 0\n",
"| | | |--- feature_1 > 0.50\n",
"| | | | |--- feature_8 <= 0.50\n",
"| | | | | |--- feature_5 <= 0.50\n",
"| | | | | | |--- feature_3 <= 0.50\n",
"| | | | | | | |--- feature_9 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_9 > 0.50\n",
"| | | | | | | | |--- feature_6 <= 0.50\n",
"| | | | | | | | | |--- class: 0\n",
"| | | | | | | | |--- feature_6 > 0.50\n",
"| | | | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_3 > 0.50\n",
"| | | | | | | |--- feature_9 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_9 > 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | |--- feature_5 > 0.50\n",
"| | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | |--- feature_9 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_9 > 0.50\n",
"| | | | | | | | |--- feature_6 <= 0.50\n",
"| | | | | | | | | |--- feature_3 <= 0.50\n",
"| | | | | | | | | | |--- class: 0\n",
"| | | | | | | | | |--- feature_3 > 0.50\n",
"| | | | | | | | | | |--- class: 0\n",
"| | | | | | | | |--- feature_6 > 0.50\n",
"| | | | | | | | | |--- class: 0\n",
"| | | | |--- feature_8 > 0.50\n",
"| | | | | |--- class: 0\n",
"| | |--- feature_4 > 0.50\n",
"| | | |--- feature_8 <= 0.50\n",
"| | | | |--- feature_3 <= 0.50\n",
"| | | | | |--- feature_5 <= 0.50\n",
"| | | | | | |--- feature_1 <= 0.50\n",
"| | | | | | | |--- feature_9 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_9 > 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_1 > 0.50\n",
"| | | | | | | |--- class: 0\n",
"| | | | | |--- feature_5 > 0.50\n",
"| | | | | | |--- class: 0\n",
"| | | | |--- feature_3 > 0.50\n",
"| | | | | |--- class: 0\n",
"| | | |--- feature_8 > 0.50\n",
"| | | | |--- feature_7 <= 0.50\n",
"| | | | | |--- feature_3 <= 0.50\n",
"| | | | | | |--- feature_5 <= 0.50\n",
"| | | | | | | |--- feature_6 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_6 > 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_5 > 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | | | |--- feature_3 > 0.50\n",
"| | | | | | |--- feature_9 <= 0.50\n",
"| | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_9 > 0.50\n",
"| | | | | | | |--- feature_6 <= 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | | |--- feature_6 > 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | |--- feature_7 > 0.50\n",
"| | | | | |--- feature_9 <= 0.50\n",
"| | | | | | |--- class: 0\n",
"| | | | | |--- feature_9 > 0.50\n",
"| | | | | | |--- feature_5 <= 0.50\n",
"| | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_5 > 0.50\n",
"| | | | | | | |--- feature_1 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_1 > 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| |--- feature_2 > 0.50\n",
"| | |--- feature_3 <= 0.50\n",
"| | | |--- feature_9 <= 0.50\n",
"| | | | |--- feature_8 <= 0.50\n",
"| | | | | |--- class: 0\n",
"| | | | |--- feature_8 > 0.50\n",
"| | | | | |--- feature_5 <= 0.50\n",
"| | | | | | |--- class: 0\n",
"| | | | | |--- feature_5 > 0.50\n",
"| | | | | | |--- feature_4 <= 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | | | | |--- feature_4 > 0.50\n",
"| | | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | | |--- feature_1 <= 0.50\n",
"| | | | | | | | | |--- feature_6 <= 0.50\n",
"| | | | | | | | | | |--- class: 0\n",
"| | | | | | | | | |--- feature_6 > 0.50\n",
"| | | | | | | | | | |--- class: 0\n",
"| | | | | | | | |--- feature_1 > 0.50\n",
"| | | | | | | | | |--- feature_6 <= 0.50\n",
"| | | | | | | | | | |--- class: 0\n",
"| | | | | | | | | |--- feature_6 > 0.50\n",
"| | | | | | | | | | |--- class: 1\n",
"| | | |--- feature_9 > 0.50\n",
"| | | | |--- class: 0\n",
"| | |--- feature_3 > 0.50\n",
"| | | |--- feature_4 <= 0.50\n",
"| | | | |--- feature_7 <= 0.50\n",
"| | | | | |--- class: 1\n",
"| | | | |--- feature_7 > 0.50\n",
"| | | | | |--- feature_6 <= 0.50\n",
"| | | | | | |--- feature_9 <= 0.50\n",
"| | | | | | | |--- feature_5 <= 0.50\n",
"| | | | | | | | |--- feature_1 <= 0.50\n",
"| | | | | | | | | |--- class: 1\n",
"| | | | | | | | |--- feature_1 > 0.50\n",
"| | | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_5 > 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | |--- feature_9 > 0.50\n",
"| | | | | | | |--- feature_8 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_8 > 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | |--- feature_6 > 0.50\n",
"| | | | | | |--- class: 1\n",
"| | | |--- feature_4 > 0.50\n",
"| | | | |--- feature_5 <= 0.50\n",
"| | | | | |--- feature_9 <= 0.50\n",
"| | | | | | |--- feature_1 <= 0.50\n",
"| | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_1 > 0.50\n",
"| | | | | | | |--- feature_6 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_6 > 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | |--- feature_9 > 0.50\n",
"| | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | |--- feature_1 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_1 > 0.50\n",
"| | | | | | | | |--- feature_8 <= 0.50\n",
"| | | | | | | | | |--- feature_6 <= 0.50\n",
"| | | | | | | | | | |--- class: 0\n",
"| | | | | | | | | |--- feature_6 > 0.50\n",
"| | | | | | | | | | |--- class: 1\n",
"| | | | | | | | |--- feature_8 > 0.50\n",
"| | | | | | | | | |--- class: 1\n",
"| | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | | |--- feature_5 > 0.50\n",
"| | | | | |--- feature_9 <= 0.50\n",
"| | | | | | |--- class: 1\n",
"| | | | | |--- feature_9 > 0.50\n",
"| | | | | | |--- feature_1 <= 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | | | | |--- feature_1 > 0.50\n",
"| | | | | | | |--- feature_6 <= 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | | |--- feature_6 > 0.50\n",
"| | | | | | | | |--- class: 0\n",
"|--- feature_0 > 0.50\n",
"| |--- feature_1 <= 0.50\n",
"| | |--- feature_2 <= 0.50\n",
"| | | |--- feature_7 <= 0.50\n",
"| | | | |--- feature_9 <= 0.50\n",
"| | | | | |--- class: 0\n",
"| | | | |--- feature_9 > 0.50\n",
"| | | | | |--- feature_4 <= 0.50\n",
"| | | | | | |--- class: 0\n",
"| | | | | |--- feature_4 > 0.50\n",
"| | | | | | |--- feature_3 <= 0.50\n",
"| | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_3 > 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | |--- feature_7 > 0.50\n",
"| | | | |--- class: 0\n",
"| | |--- feature_2 > 0.50\n",
"| | | |--- feature_3 <= 0.50\n",
"| | | | |--- feature_9 <= 0.50\n",
"| | | | | |--- feature_8 <= 0.50\n",
"| | | | | | |--- feature_5 <= 0.50\n",
"| | | | | | | |--- feature_6 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_6 > 0.50\n",
"| | | | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | | | |--- class: 1\n",
"| | | | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_5 > 0.50\n",
"| | | | | | | |--- feature_4 <= 0.50\n",
"| | | | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | | | |--- class: 0\n",
"| | | | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_4 > 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | |--- feature_8 > 0.50\n",
"| | | | | | |--- class: 0\n",
"| | | | |--- feature_9 > 0.50\n",
"| | | | | |--- class: 0\n",
"| | | |--- feature_3 > 0.50\n",
"| | | | |--- feature_5 <= 0.50\n",
"| | | | | |--- feature_8 <= 0.50\n",
"| | | | | | |--- class: 1\n",
"| | | | | |--- feature_8 > 0.50\n",
"| | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | |--- feature_6 <= 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | | |--- feature_6 > 0.50\n",
"| | | | | | | | |--- feature_9 <= 0.50\n",
"| | | | | | | | | |--- class: 1\n",
"| | | | | | | | |--- feature_9 > 0.50\n",
"| | | | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | | |--- feature_5 > 0.50\n",
"| | | | | |--- feature_6 <= 0.50\n",
"| | | | | | |--- feature_9 <= 0.50\n",
"| | | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | |--- feature_9 > 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | | | |--- feature_6 > 0.50\n",
"| | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | |--- feature_4 <= 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | | |--- feature_4 > 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | |--- feature_4 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_4 > 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| |--- feature_1 > 0.50\n",
"| | |--- feature_2 <= 0.50\n",
"| | | |--- feature_6 <= 0.50\n",
"| | | | |--- feature_3 <= 0.50\n",
"| | | | | |--- feature_7 <= 0.50\n",
"| | | | | | |--- feature_5 <= 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | | | | |--- feature_5 > 0.50\n",
"| | | | | | | |--- feature_9 <= 0.50\n",
"| | | | | | | | |--- feature_4 <= 0.50\n",
"| | | | | | | | | |--- class: 1\n",
"| | | | | | | | |--- feature_4 > 0.50\n",
"| | | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_9 > 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | |--- feature_7 > 0.50\n",
"| | | | | | |--- class: 1\n",
"| | | | |--- feature_3 > 0.50\n",
"| | | | | |--- feature_7 <= 0.50\n",
"| | | | | | |--- feature_8 <= 0.50\n",
"| | | | | | | |--- feature_9 <= 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | | |--- feature_9 > 0.50\n",
"| | | | | | | | |--- feature_5 <= 0.50\n",
"| | | | | | | | | |--- class: 1\n",
"| | | | | | | | |--- feature_5 > 0.50\n",
"| | | | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_8 > 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | | | |--- feature_7 > 0.50\n",
"| | | | | | |--- feature_5 <= 0.50\n",
"| | | | | | | |--- feature_4 <= 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | | |--- feature_4 > 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_5 > 0.50\n",
"| | | | | | | |--- feature_4 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_4 > 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | |--- feature_6 > 0.50\n",
"| | | | |--- class: 1\n",
"| | |--- feature_2 > 0.50\n",
"| | | |--- feature_9 <= 0.50\n",
"| | | | |--- feature_8 <= 0.50\n",
"| | | | | |--- class: 1\n",
"| | | | |--- feature_8 > 0.50\n",
"| | | | | |--- feature_4 <= 0.50\n",
"| | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | |--- class: 0\n",
"| | | | | |--- feature_4 > 0.50\n",
"| | | | | | |--- class: 1\n",
"| | | |--- feature_9 > 0.50\n",
"| | | | |--- feature_6 <= 0.50\n",
"| | | | | |--- feature_4 <= 0.50\n",
"| | | | | | |--- class: 1\n",
"| | | | | |--- feature_4 > 0.50\n",
"| | | | | | |--- feature_8 <= 0.50\n",
"| | | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | | |--- feature_5 <= 0.50\n",
"| | | | | | | | | |--- class: 0\n",
"| | | | | | | | |--- feature_5 > 0.50\n",
"| | | | | | | | | |--- class: 1\n",
"| | | | | | |--- feature_8 > 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | | |--- feature_6 > 0.50\n",
"| | | | | |--- feature_5 <= 0.50\n",
"| | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | |--- feature_4 <= 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | | |--- feature_4 > 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | |--- class: 1\n",
"| | | | | |--- feature_5 > 0.50\n",
"| | | | | | |--- feature_3 <= 0.50\n",
"| | | | | | | |--- feature_7 <= 0.50\n",
"| | | | | | | | |--- class: 1\n",
"| | | | | | | |--- feature_7 > 0.50\n",
"| | | | | | | | |--- class: 0\n",
"| | | | | | |--- feature_3 > 0.50\n",
"| | | | | | | |--- class: 0\n",
"\n"
]
}
],
"source": [
"print(export_text(clf))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "339916bb",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}